Abstract

Expansion of the neocortex is a hallmark of human evolution. However, it remains an 
open question what adaptive mechanisms facilitated its expansion. Here we show, using 
gyrencephaly index (GI) and other physiological and life-history data for 102 mammalian 
species, that gyrencephaly is an ancestral mammalian trait. We provide evidence that the 
evolution of a highly folded neocortex, as observed in humans, requires the traversal of 
a threshold of ~10^9 neurons, and that species above and below the threshold exhibit a
bimodal distribution of physiological and life-history traits, establishing two phenotypic 
groups. We identify, using discrete mathematical models, proliferative divisions of pro- 
genitors in the basal compartment of the developing neocortex as evolutionarily necessary 
and sufficient for generating a fourteen-fold increase in daily prenatal neuron production 
and thus traversal of the neuronal threshold. Finally, using RNA-seq data from fetal human 
neocortical germinal zones, we show a genomic correlate to the neuron threshold in the 
differential conservation of long intergenic non-coding RNA. 
(see arXiv: 1304.5412)

Introduction



Development of the human neocortex involves a lineage of neural stem and progenitor cells 
that forms a proliferative region along the ventricular epithelium. The proliferation of cells 
within this region expands the neocortex by increasing neuron number. At the onset of mam- 
malian cortical neurogenesis, neuroepithelial cells transform into radially oriented apical radial 
glia (aRG), which proliferate extensively at the apical surface of the ventricular zone and divide 
asymmetrically to self-renew and generate a neuron, intermediate progenitor (IP), or basal radial
glia (bRG) (Franco and Müller, 2013). IP cells delaminate from the apical surface and translocate
their nucleus to the basal region of the ventricular zone (VZ) to form a second germinal
layer, the subventricular zone (SVZ), where they divide symmetrically to generate two neurons
(Noctor et al., 2004; Miyata et al., 2004; Haubensak et al., 2004). Similarly to aRG cells at the
ventricular surface, bRG cells divide asymmetrically, albeit abventricularly (Fietz et al., 2010;
Hansen et al., 2010; Shitamukai et al., 2011; Wang et al., 2011); but contrary to aRG cells,
bRG in the human may both divide symmetrically and generate neurons via transit-amplifying
progenitors (TAPs), a cell type that is not observed to originate basally in the mouse (Hansen
et al., 2010). Furthermore, bRG, which maintain a single fiber ascending only to the basal
surface in the mouse, may also be non-polar or bipolar in the macaque (Betizeau et al., 2013). The
abventricular expansion of progenitors during cortical neurogenesis in humans further compart- 
mentalizes the basal region into an inner (ISVZ) and outer SVZ (OSVZ), driving the radial 
fibers to have divergent, rather than parallel, trajectories to the cortical plate, and thus creating 
the folded cortical pattern observed in gyrencephalic species through the tangential expansion 
of migrating neurons (Smart et al., 2002; Borrell and Reillo, 2012; Lewitus et al., 2013). For
this reason, and based on supporting evidence obtained in the gyrencephalic human and ferret 
and lissencephalic mouse, it was originally thought that an abundance of asymmetrically divid- 
ing bRG cells in the outer SVZ was an evolutionary determinant for establishing a relatively 
large and gyrencephalic neocortex (Fietz et al., 2010; Hansen et al., 2010; Reillo et al., 2011).
But recent work in the lissencephalic marmoset (Callithrix jacchus) has shown that bRG cells 
may, in fact, exist in comparable abundance in both gyrencephalic and lissencephalic species 
(García-Moreno et al., 2012; Kelava et al., 2012) and so cannot alone be sufficient for either
establishing or increasing cortical gyrification. Thus, despite considerable progress in the study 
of brain size evolution (Finlay and Darlington, 1995; Krubitzer and Kaas, 2005; Hager et al.,
2012), the adaptive mechanism that has evolved along certain mammalian lineages to produce
a large and folded neocortex is not known.

Figure 1: Ancestral reconstruction of GI values for 102 mammalian species. GI values were determined as
illustrated in Figure S1 for the species listed in Table S1. Reconstructed GI values for putative ancestors are
presented at selected internal nodes of the phylogenetic tree. MYA, million years ago; colors indicate taxonomic
groups. Images of Nissl-stained coronal sections of representative species for each taxonomic group, downloaded
from http://brainmuseum.org, along with respective GI values, are shown on the right.

In this study, we analyzed physiological and life-history data from 102 mammalian species 
(Table S1; Table S2; External Database 1). We show that a gyrencephalic neocortex is ancestral
to all mammals (Figure 1) and that GI (Figure S1), like brain size, has increased and decreased
along many mammalian lineages. These changes may be reliably characterized by convergent
adaptations into two distinct physiological and life-history programs (Figure 2a), resulting in a
bimodal distribution of mammalian species (Figure 2b) with a robust threshold value for both
GI and neuron number (Figure 3). Traversal of the threshold requires greater neuron production
per gestation day (Figure 4a,b), which we argue is necessitated by the evolution of increased 
proliferative potential in SVZ progenitors during cortical neurogenesis (Figure 5). Using fetal 
human transcriptome data from neocortical germinal zones, we show that long intergenic non- 
coding RNA (lincRNA) expressed during human neurogenesis are selectively lost in species 
below the neuron threshold. This provides evidence for the involvement of lincRNA in not only 
regulating, but also defining the neurogenic program of mammalian species.

The mammalian ancestor was gyrencephalic

We tested multiple evolutionary models for GI evolution. The model that conferred most power 
to explain GI values across the phylogeny while making the fewest assumptions about the data 
(i.e., had the lowest Akaike Information Criterion (AIC)) showed a disproportionate amount of 
evolutionary change to have occurred recently, rather than ancestrally, in mammals (Figure S2) 
and diverged significantly from a null model of stochastic evolution (Pagel, 1999). We identified
a folded neocortex (GI = 1.36 ± 0.16 s.e.m.) as an ancestral mammalian trait (Figure 1). It is
apparent from ancestral and other internal node reconstructions (Figure S3) that GI is very vari- 
able, but also that reductions in the rate at which GI evolves have favored branches leading to 
decreases in GI (e.g., strepsirrhines and insectivores) and accelerations in that rate have favored 
branches leading to increases in GI (e.g., carnivores and caviomorphs). A simulation of the 
average number of total evolutionary transitions between GI values evidences more affinity for 
transitioning from high-to-low than low-to-high GI values: the majority of high-to-low transi- 
tions (58.3%) occurred in species with a GI < 1.47; and the fewest transitions (16.7%) occurred 
across a threshold value of 1.5 (Figure S4). This indicates that, although there is an evident 
trend in mammalian history to become increasingly gyrencephalic, the most variability in GI 
evolution has been concentrated among species below a certain threshold value (GI = 1.5). We 
therefore present a picture of early mammalian history, contrary to those previously painted, but 
which is gathering evidence through novel approaches (O'Leary et al., 2013; Romiguier et al.,
2013), that the Jurassic-era mammalian ancestor may, indeed, have been a large-brained species
with a folded neocortex. 
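
As a hedged illustration of the model comparison described above, the following Python sketch computes AIC scores and a likelihood-ratio p-value against a random-walk null. The log-likelihoods, parameter counts, and model names are hypothetical placeholders, not values from this study, and the test assumes that the alternative model has exactly one more free parameter than the null (one degree of freedom).

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower means a better fit/complexity trade-off."""
    return 2.0 * n_params - 2.0 * log_likelihood


def lrt_p_value_df1(loglik_null, loglik_alt):
    """Likelihood-ratio test p-value, assuming one extra parameter (df = 1)."""
    statistic = 2.0 * (loglik_alt - loglik_null)
    if statistic <= 0.0:
        return 1.0
    # For df = 1 the chi-squared survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical (log-likelihood, number of parameters) pairs, for illustration only.
candidate_models = {
    "random_walk": (-60.0, 2),
    "rate_shift": (-52.5, 3),
}

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidate_models.items()}
best_model = min(aic_scores, key=aic_scores.get)
p_value = lrt_p_value_df1(candidate_models["random_walk"][0],
                          candidate_models["rate_shift"][0])
```

With these placeholder numbers the model with the extra parameter has the lower AIC and also departs significantly from the null; with real data the two criteria need not agree.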



A threshold in cortical neuron number 

The evolutionary effects of a folded neocortex on the behavior and biology of a species are
not immediately clear. We therefore analyzed associations, across the phylogeny, of GI with 
discrete character states of 37 physiological and life-history traits (Table S2). Distinct sets of 
small but significant (R^2 < 0.23, P < 0.03) associations were found for species above and
below a GI value of 1.5, indicating that these two groups of species adapt to their environments 
differently (Figure 2a). Both groups were sampled from across the phylogeny, showing no 
phylogenetic signal. Clustering analyses also supported a bimodal distribution above and below 
a threshold value of 1.5 (Figure 2b; Figure S5). To test the bimodal distribution explicitly, 
we regressed GI values against neuroanatomical traits and found that each scaling relationship 
could be explained comparably well by either a non-linear function (Figure 3a) or two grade-shifted
linear functions, with the best-fit linear models drawing significantly different slopes (P
= 3.4 x 10^-4) for high-GI (> 1.5) and low-GI (< 1.5) species (Figure 3b,c).

Figure 2: Clustering of GI values based on life-history association analysis (a) and minimum-energy distance (b).
(a) Stochastic mapping of physiological and life-history traits with GI values for the 102 mammalian species listed
in Table S1. GI values were separated into four groups based on clustering. Forty traits, each comprising 3-6
character states, were analyzed (see Table S2 for a complete list), and the states showing a significant positive
(P, green) or negative (N, red) association with a group of GI values are shown. Note the major overlap between
the two low-GI groups (10/27) and between the two high-GI groups (9/24), whereas only 3/48 character states
are shared between GI groups < 1.5 and > 1.5. (b) Hierarchical clustering based on minimum-energy distance
of the GI values for 101 mammalian species. Note that the greatest clustering height is between species with GI
values of < 1.5 and > 1.5.

By plotting
GI as a function of cortical neuron number, we were able to demarcate, with two significantly 
different linear regressions for high- and low-GI species (T = 4.611, d.f. = 29, P = 2.8 x 10 ),
a cortical no-man's-land centered on an area approximating 1 ± 0.11 x 10^9 neurons and 1.56
± 0.06 GI (Figure 3d). The deviation of these results from previous work, which has shown
strong phylogenetic signals associated with both GI (Pillay and Manger, 2007; Zilles et al.,
2013) and neuron counts (Azevedo et al., 2009), may be explained both by our more than 2-fold
increase in sampled species and the a priori assumption of previous work that GI and neuron
number evolve as a function of phylogeny. Variation in GI, therefore, has not evolved linearly 
across the phylogeny, but has in fact evolved differentially in two phenotypic groups. Each
group may be characterized not only by a high (> 1.5) or low (< 1.5) GI value, but also by a 
distinct constellation of other physiological and life-history traits which have accompanied each 
group over evolutionary time. 
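
To make the gap between the two regressions concrete, the sketch below evaluates the two linear fits reported for Figure 3d at one billion neurons. It assumes that both axes of that panel are natural-log transformed, so that x = ln(neuron number) and y = ln(GI); this reading of the caption is an assumption, and the function and constant names are illustrative only.

```python
import math

# Slopes and intercepts of the two linear fits reported for Figure 3d.
# Assumption: x = ln(cortical neuron number) and y = ln(GI).
LOW_GI_FIT = (0.072, -1.188)   # species with GI < 1.5
HIGH_GI_FIT = (0.140, -2.370)  # species with GI > 1.5


def predicted_gi(fit, neuron_count):
    """Back-transform the fitted ln(GI) at a given neuron count."""
    slope, intercept = fit
    return math.exp(slope * math.log(neuron_count) + intercept)


threshold_neurons = 1e9
gi_low = predicted_gi(LOW_GI_FIT, threshold_neurons)
gi_high = predicted_gi(HIGH_GI_FIT, threshold_neurons)
gap = gi_high - gi_low
```

Under these assumptions the low-GI fit predicts a GI below 1.5 and the high-GI fit a GI above 1.5 at the same neuron count, which is one way to picture the discontinuity between the two groups.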



More efficient neurogenesis in large-brained species 

By establishing an evolutionary threshold based on both degree of gyrencephaly and neuron 
number, we identified two neocortical phenotypic groups, which found support in their distinct 
life-history associations (see previous section). These groups could be further divorced by ac- 
counting for the amount of brain weight accumulated per gestation day - confident proxies for 
neonate brain weight and neurogenic period, respectively (Figure S6) - which we show to be, 
on average, 14-times greater in high- compared to low-GI species (Figure 4). Notably, each 
GI group is constituted by both altricial and precocial species, so the degree of pre- versus 
post-natal development is not enough to explain the discrepancy in brain weight per gestation 
day in each group. Rather, to explain the discrepancy, we introduced a deterministic model of 
cortical neurogenesis, using series summarizing seven neurogenic lineages and based on cell- 
cycle length, neuroepithelial founder pool size, neurogenic period, and estimates of relative 
progenitor-type population sizes (Table 1). We arrived at two models, based on the analysis of 
16 species, that show the highest reliability for predicting cortical neuron numbers in a range 
of species: a mouse neurogenic program, which implicates only aRG, IP, and asymmetrically 
dividing bRG; and a human neurogenic program, which additionally implicates proliferating 
progenitors in the SVZ. Each model is defined by the proportional occurrence of each lineage 
in that model (Table 2). Using the mouse neurogenic program we were able to predict neuron 
counts within 2% of the observed counts for mouse and rat, but underestimated neuron counts 
by more than 80% in high-GI species (Figure 5; Table S3). Similarly, the human neurogenic 
program predicted neuron counts within 5% for all high-GI species, but overestimated neuron
counts by more than 150% for low-GI species.

Figure 3: Ln-transformed plots showing GI values as a function of brain weight (a, b, 101 species), neocortical
volume (c, 29 species) and cortical neuron number (d, 22 species). (a) Regression analysis using one non-linear
fit for all values (y = 0.018x^2 + 0.037x + 0.014, R^2 = 0.612, P = 6 x 10^-5); (b-d) regression analyses using
two different linear functions (b, blue line: y = 0.075x - 0.481, R^2 = 0.56, P = 4 x 10^-5, red line: y = 0.245x +
0.018, R^2 = 0.73, P = 1 x 10^-5; c, blue line: y = 0.050x - 0.194, R^2 = 0.21, P = 0.017, red line: y = 0.154x
- 1.09, R^2 = 0.82, P = 0.004; d, blue line: y = 0.072x - 1.188, R^2 = 0.81, P = 1 x 10^-4; red line: y = 0.140x
- 2.370, R^2 = 0.98, P = 3 x 10^-5) for species with GI values of < 1.5 (blue triangles) and > 1.5 (red circles),
respectively; mouse and human are indicated by green symbols. The inset in (b) shows the AIC values for models
fitted with 1-5 linear slopes; note that a two-slope model best explains the data. See Table S1 for data.

Figure 4: Brain weight per gestation day is considerably greater for high- versus low-GI species. (a) Ln-transformed
density plot of brain weight per gestation day for 96 eutherian species listed in Table S1 with GI values of < 1.5
(blue) and > 1.5 (red). Note the significantly different means for the two groups (dashed blue and red lines, T
= 5.16, d.f. = 41, P = 4 x 10^-5). (b) Ln-transformed plot of brain weight per gestation day for 96 mammalian
species (see a). Dashed blue line, mean value for GI < 1.5 (-2.04 ± 0.047, s.d.); red dashed line, mean value for
GI > 1.5 (0.583 ± 0.050, s.d.). The colors in the index refer to species in Figure 1. See Table S1 for data.

Increased proportional occurrences of the bRG
lineage with increasing brain size were required to achieve estimates with < 5% deviation from
observed neuron counts in all low-GI species (Table 2; Figure S7). Estimates of proportional 
occurrences in the mouse, marmoset, and rabbit are supported by previous work detailing relative
abundances of different progenitor cell types during cortical neurogenesis (Wang et al.,
2011; Kelava et al., 2012; IK and WBH, in preparation). Evolutionary gain or loss of proliferative
potential in the SVZ is an essential mechanistic determinant of neocortical expansion,
such that its presence in high-GI species and absence in low-GI species is sufficient and even 
requisite for explaining neocortical evolution (Figure S8). 
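
The roughly 14-fold difference in brain weight per gestation day can be cross-checked against the group means given in the Figure 4 caption. The sketch below assumes those means are natural-log values, so the ratio it returns is a ratio of geometric means, which is not necessarily how the average reported in the text was computed.

```python
import math

# Group means of ln(brain weight per gestation day) from the Figure 4 caption.
MEAN_LN_LOW_GI = -2.04    # GI < 1.5
MEAN_LN_HIGH_GI = 0.583   # GI > 1.5

# A difference of natural logs is the log of a ratio, so exponentiating
# the difference gives the fold change between the back-transformed means.
fold_difference = math.exp(MEAN_LN_HIGH_GI - MEAN_LN_LOW_GI)
```

The result is a little under 14, consistent with the figure quoted in the text.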



Adaptive evolution of proliferative potential in the basal germinal zone 

To simulate the adaptiveness of evolving increased proliferative potential in the SVZ in two 
lissencephalic species - mouse and marmoset - we calculated trade-offs between neuroepithe- 
lial founder pool size and neurogenic period using mouse/marmoset and human programs of 
cortical neurogenesis to achieve one billion neurons. We show that, in both species, evolving 
a lineage of proliferating basal progenitors is between 2- and 6-times more cost-efficient than 
either expanding founder pool size or lengthening neurogenesis; and that the marmoset, by 
evolving proliferating progenitors, could keep its observed founder pool size or slightly reduce 
its neurogenic period to achieve one billion neurons (Figure S9). We further clarified the signif- 
icance to neuronal output of each progenitor-type with deterministic and stochastic models of 
temporal dynamics and progenitor cell-type variables. From these we conclude that basal progenitors
are increasingly necessary in larger brains and that achieving 10^9 neurons is statistically
implausible in the absence of proliferative basal progenitors (Table S4). Finally, we described 
the dynamics of asymmetric versus symmetric progenitors, isolated from their observed lineage 
beginning at the apical surface, by introducing three ordinary differential equations (ODEs) 
modeling a self-renewing cell that generates either a differentiated cell or a proliferative cell.
The ODEs describe a self-renewing mother progenitor, which can generate either a neuron or a
proliferative daughter at each division. The proliferative daughter is allowed one proliferative
division followed by self-consumption. The likelihood of a neuron or proliferative daughter
being generated by the mother, therefore, is interdependent. We also include the pool of mother
progenitors as a linear variable. We show that neuronal output of the system increases dramatically
when both the initial pool of self-renewing cells and the likelihood of those initial cells to
generate proliferative, rather than differentiated, cells approach saturation (Figure S10).
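
The equations themselves are not given in this section, so the following is only a minimal sketch of how a three-variable system of this kind might be written and integrated. Everything specific in it is an assumption made for illustration: the constant mother pool, the single division rate shared by all cell types, the reading of "one proliferative division followed by self-consumption" as four neurons per proliferative daughter, and the use of forward Euler steps.

```python
def simulate_neuron_output(mother_pool, p_proliferative, rate=1.0,
                           t_end=10.0, dt=0.001):
    """Euler integration of a hypothetical three-variable progenitor model.

    mothers : self-renewing pool, held constant (an assumption)
    prolif  : proliferative daughters awaiting their single division
    neurons : cumulative neuronal output
    """
    mothers = float(mother_pool)
    prolif = 0.0
    neurons = 0.0
    for _ in range(int(round(t_end / dt))):
        d_mothers = 0.0  # each mother division renews the mother
        # Daughters arise with probability p and leave after one division.
        d_prolif = rate * p_proliferative * mothers - rate * prolif
        # Assumption: a daughter's one division plus self-consumption
        # of its two progeny yields four neurons in total.
        d_neurons = (rate * (1.0 - p_proliferative) * mothers
                     + 4.0 * rate * prolif)
        mothers += d_mothers * dt
        prolif += d_prolif * dt
        neurons += d_neurons * dt
    return neurons
```

In this toy system, output rises with both the size of the mother pool and the probability of generating a proliferative daughter; it is not intended to reproduce Figure S10.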



Neurodevelopmental long intergenic non-coding RNA are selectively lost in 
low-GI species 

In order to assess the degree to which morphological convergence is corroborated by convergence
at the genomic level, we probed published RNA-seq data collected from human fetal
neocortical germinal zones during mid-neurogenesis (Fietz et al., 2012). Because cis-acting
non-coding RNAs proximal to genes differentially expressed in the human fetal brain show ac- 
celerated evolution along the human lineage, we limited our search to lincRNA. We identified 
186 lincRNA differentially expressed in at least one germinal zone or the cortical plate (Table 
S8). Of these, we shortlisted 142, which had at least one adjacent protein-coding gene expressed 
during neurogenesis. We then determined whether the genes proximal to the lincRNA in the hu- 
man genome (defined as the lincRNA gene neighborhood) were proximal to the same genomic 
sequence in 31 other species (30 mammals plus chicken). We found, firstly, that lincRNA 
gene-neighborhood conservation could not be explained by phylogenetic relatedness and that, 
secondly, gene-neighborhood conservation correlated well with GI (R^2 = 0.68, P = 4.54 x 10 )
and brain weight (R^2 = 0.71, P = 4.55 x 10^-7), but poorly with maximum lifespan (R^2 = 0.44, P
= 0.0004) and body weight (R^2 = 0.39, P = 0.0001). Furthermore, by calculating the number of
lincRNA expected to be conserved in each species based on phylogenetic relatedness to human, 
we could determine which species fell below and which above null expectations. We found 
that all low-GI species (except the marmoset and manatee) fell below and all high-GI species 
above phylogenetic expectations. No such trend was found with lincRNA expressed maximally 
in adult adipose tissue (see Determining the gene-neighborhood conservation of neurodevelopmental
lincRNA). We therefore provide powerful evidence for a genomic correlate of the neuron
threshold in the disproportionate conservation of neurodevelopmental lincRNA in high- versus
low-GI species.

Figure 5: Distinct combinations of progenitor lineages are required to predict neocortical neuron numbers for
low- versus high-GI species. (a) Schematics of the 7 lineages used to construct neurogenic output in species. (b)
Plotted neuronal output of the lineages in (a) beginning with two cells, over 10 divisions. Series in the legend
summarize the neuronal output of each lineage, where n_i is the number of i divisions. A constant, c = 0.989, is
incorporated into the series for lineage 5, allowing the series to converge on the true value of the lineage output
as the number of divisions becomes increasingly numerous. (c) Ln-transformed plot of observed neuron counts
as a function of neurogenic period for 4 species with a GI < 1.5 (open blue triangles) and 6 species with a GI
> 1.5 (open red circles). Predicted neuron counts were calculated using combinations of the lineages in (a) that
accurately fitted to the observed neuron counts either for mouse (closed gold symbols) or human (closed green
symbols). Note that the mouse neurogenic program implicates only lineages 1-3; the human neurogenic program
only lineages 2-7; and that lineages 4-7 were considered interdependent, such that an increase (or decrease) in
the occurrence of one of these lineages necessitated an attendant increase (or decrease) in the others. See Table 1
for observed and predicted neuron counts and Table 2 for the proportional contribution of each lineage for mouse
and human.

Figure 6: Gene-neighborhood conservation of 142 lincRNA expressed during human neurogenesis across 29
mammalian species. Conservation is shown to be above null phylogenetic expectations in high-GI species (red)
and below expectations for low-GI species (blue). The two exceptions are the marmoset, a low-GI primate
species, and the manatee, a large-brained lissencephalic species belonging to Afrotheria; both of these show lincRNA
gene-neighborhood conservation considerably above null expectations.



Discussion 

The emergence of new structures, in the most general sense, is typically limited to selection on 
existing developmental processes; and conserved pathways may persist, over evolutionary time, 
even when the phenotype is transformed or unexpressed (Mayr, 1960; Shubin et al., 1997; Hall,
2003). However, it is also evident that development may be adapted without affecting phenotype
(e.g., Bolker (1994); Kalinka and Tomancak (2012)). Therefore, in order to understand
selective pressures acting on a discontinuous or convergent trait, it is necessary to investigate 
the underlying developmental processes generating it. We have shown that a gyrencephalic neocortex
is ancestral to mammals, which is concordant with evidence (Romiguier et al., 2013) that
the mammalian ancestor was large (> 1 kg) and long-lived (> 25-year lifespan) and, furthermore,
provides considerable resolution to recent evidence for a gyrencephalic eutherian ancestor
(O'Leary et al., 2013) by sampling nearly twice as many species and categorizing gyrencephaly
as a continuous, rather than a binary, trait. More surprisingly, we show that convergent evolution
of higher orders of gyrencephaly along divergent lineages has been accompanied by two
distinct constellations of physiological and life-history paradigms. Specifically, species with a 
GI > 1.5, which is commensurate with one billion cortical neurons, exhibit patterns of develop- 
ment and life-history that are distinct from species with a GI < 1.5, irrespective of phylogeny. 
This implies that there is a considerable constraint on either the ability of species of a given 
neocortical size to exploit certain ecologies or the potential for species of a given ecology to 
freely adapt neocortical size. Even marine mammals, whose selection pressures are sui generis, 
may largely be held to the same evolutionary stereotyping as terrestrial mammals (Figure S11).
Furthermore, no species - with the exception of the house cat (Felis catus), which may be under
unique selection pressures due to its ten-thousand-year-old domestication (Driscoll et al., 2009)
- falls within the limits of the GI or neuronal threshold range (Figure 3d). While our results 
countenance previous studies showing associations between physiological and life-history traits 
in mammals (see Martin et al. (2005)), we identify those traits to have a bimodal distribution,
rather than to vary allometrically, across species. This distribution depicts a Waddington-type 
landscape for neocortical expansion - albeit relevant at the species-level - wherein the thresh- 
old represents an adaptive peak requiring a particular adaptation in neurogenic programming 
within a population for traversal. Our results may explain this landscape by mechanistic differ- 
ences occurring during cortical neurogenesis between species above and below the threshold: 
the necessity of proliferative basal progenitors in high-GI species and their putative absence 
in low-GI species. The adaptation of proliferative basal progenitors may be tantamount to a
relaxation of constraints along lineages leading to larger-brained species (Boddy et al., 2012);
however, our analysis of lincRNA gene-neighborhood conservation suggests that the selective
loss of genomic elements regulating neurogenesis is responsible for the evolution of low-GI 
species. The loss of lincRNA in low-GI species, which are typically small-bodied (Figure 2a), 
may simply be caused by a higher rate of meiotic combination in low-GI species, resulting in 
more frequent meiotic errors and thereupon loss of lincRNA (Lewitus and Kalinka, 2013). But,
firstly, this does not proscribe a functional role for lincRNA in the differential regulation of neu- 
rogenesis in low- versus high-GI species; and, secondly, both the greater predictive powers of 
GI and brain weight compared to lifespan or body weight for lincRNA gene-neighborhood con- 
servation and the more strictly phylogenetic conservation of lincRNA gene-neighborhood for 
non-neurodevelopmental lincRNA (see Neurodevelopmental long intergenic non-coding RNA 
are selectively lost in low-GI species) speak in favor of the relevance of neocortical develop- 
ment for the selective loss or retention of certain lincRNA. 

Furthermore, our human neurogenic program clearly shows that the same neurogenic lineages
in the same proportions are required to generate the neocortices of monkeys, apes, and
humans, and may even be extended to carnivores, cetartiodactyls, and other high-GI species
(Figure S11), demonstrating that neurogenic period alone may be sufficient to explain differences
in neocortical size between any species in the same GI group (Figure S12).

We propose that proliferative basal progenitors, rather than simply an abundance of asym- 
metrically dividing bRGs in an expanded SVZ, are necessary and sufficient for the evolution 
of an expanded and highly folded neocortex in mammals. Recent work in the fetal macaque
supports this proposal (Betizeau et al., 2013). We thus conclude that an increase in proliferative
potential in the basal neurogenic program is an adaptive requirement for traversing the evolu- 
tionary threshold identified here. But because we reconstruct the eutherian ancestor to have a 
GI value of 1.48 ± 0.13 (s.e.m.), which falls within the range of the observed threshold, we are 
left with an ambivalent evolutionary history for mammalian neocortical expansion: either (i) 
proliferative basal progenitors are ancestral to all eutherian mammals and were selected against 
along multiple lineages (e.g., rodents, strepsirrhines), so that the ultimate loss of basal prolif- 
erative potential in certain taxa, and therefore the evolution of low-GI species, is the result of 
divergent developmental adaptations; or (ii) proliferative basal progenitors are not ancestral to 
eutherian mammals, but evolved convergently along multiple lineages, in which case the devel- 
opmental process for their inclusion in neurogenic programming may be conserved, even if that 
process was unexpressed for long stretches of mammalian evolution. While both of these his- 
tories are speculative, the prodigious conservation of lincRNA in high- versus low-GI species 
supports (i). Ultimately, we have revealed an important insight into mammalian evolution: a 
threshold exists in mammalian brain evolution; neocortical expansion beyond that threshold re- 
quires a specific class of progenitor cell-type, likely regulated by lincRNA; and the difference 
in neurogenic programming between any species on the same side of that threshold does not 
require novel progenitor-types or adaptations in progenitor-type behavior. Further experimen- 
tal research into the conservation of genomic regions regulating the expression of proliferative 
basal progenitors, either at the ventricle or through maintenance of a proliferative niche in the 
SVZ, in low- versus high-GI species may be sufficient to determine whether the mechanism for 
neocortical expansion has evolved independently in distantly related species or is the product 
of a deep homology in mammalian neurogenesis.

Materials and Methods



Calculating GI 

We calculated GI using images of Nissl-stained coronal sections from http://brainmuseum.org. 
We used 10-22 sections, equally spaced along the anterior-posterior axis of the brain, for each 
species (Figure S1). The inner and outer contours of the left hemisphere were traced in Fiji
(http://fiji.sc/wiki/index.php/Fiji). The values calculated are marked with an asterisk in Table
S1. Additional GI values were collected from the literature (Table S1; External Database 1).



Species (e.g., platypus) whose cortical folding has been described (Goffinet, 2006; Rowe, 1990),
but not measured according to the method established in Zilles et al. (1988), were omitted from
our analyses (see Reconstructing the evolutionary history of GI). Work in humans and baboons
has shown that inter-individual variation in GI is not enough to outweigh interspecific differences
(Rogers et al., 2010; Toro et al., 2008).



Stochastic mapping of GI across the mammalian phylogeny 

We used a comprehensive phylogenetic approach to map 41 life-history and physiological char- 
acter traits collected from the literature (Tables S1,S2) onto hypotheses of phylogenetic rela- 
tionships in Mammalia, in order to examine how those traits correlate, over evolutionary time, 
with degree of gyrencephaly. Continuous character traits were discretized using the consensus
of natural distribution breaks calculated with a Jenks-Caspall algorithm (Jenks and Caspall,
1971), model-based clustering according to the Schwarz criterion (Fraley and Raftery, 2002),
and hierarchical clustering (Székely and Rizzo, 2005). Character histories were then corrected
for body mass with a phylogenetic size correction (Collar and Wainwright, 2006) and summarized
across the phylogeny using posterior probabilities. Associations between individual states
of each character trait along those phylogenetic histories were calculated in SIMMAP (v1.5)
using empirical priors (Bollback, 2006); the association between any two states was a measure
of the frequency of occurrence (i.e., the amount of branch length across the tree) of those
states on the phylogeny. The sums, rates, and types of changes for GI and body weight were
plotted as mutational maps to assess directional biases in their evolution (Cunningham, 1999;
Huelsenbeck and Rannala, 2003; Lewitus and Soligo, 2011). The phylogeny used in this analysis
was derived from a species-level supertree (Bininda-Emonds et al., 2007). We appreciate
that the phylogenetic hypothesis reconstructed by Meredith et al. (2011) gives notably deeper
divergence dates for mammalian subclasses; however, not enough of our sampled species were
included in this reconstruction for it to be useful here. 
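
To make the idea of natural distribution breaks concrete, the following sketch partitions a one-dimensional trait into k classes by minimizing the within-class sum of squared deviations. Note that it uses an exact dynamic-programming formulation (often called Fisher-Jenks) rather than the iterative Jenks-Caspall procedure cited above, and it does not reproduce the consensus with the two clustering methods; the function name and the example values are hypothetical.

```python
def natural_breaks(values, k):
    """Split sorted values into k contiguous classes with minimal
    total within-class sum of squared deviations (exact DP)."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums allow the cost of any contiguous slice in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations from the mean for xs[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j values into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    back[c][j] = i

    # Walk the back-pointers to recover the class boundaries.
    classes, j = [], n
    for c in range(k, 0, -1):
        i = back[c][j]
        classes.append(xs[i:j])
        j = i
    classes.reverse()
    return classes

# Hypothetical trait values with three visually obvious groups.
classes = natural_breaks([1.0, 2.6, 1.1, 4.9, 1.2, 2.5, 5.0], 3)
```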
Reconstructing the evolutionary history of GI



Variation in the mode and tempo of a continuous character trait is not always best characterized
by a random walk (i.e., Brownian motion). Therefore, we compared a range of evolutionary
models on the phylogenetic distribution of GI to find the best fit for the data (Felsenstein, 1973;
Harmon et al., 2008; O'Meara et al., 2006; Paradis et al., 2004). Log-likelihood scores for each
model were tested against the random walk score using the cumulative distribution function of
the χ² distribution. Maximum-likelihood ancestral character states of GI and rate-shifts in the
evolution of GI were then constructed using the best-fit model, with the standard error and
confidence intervals calculated from root node reconstruction in PDAP using independent contrasts
(Garland and Ives, 2000; Garland et al., 2005; Maddison and Maddison, 2011). Although a
number of putatively lissencephalic non-eutherians were unavailable for our analyses (see
Calculating GI), we nonetheless reconstructed alternative ancestral GI values that included one
hypothetical monotreme and three hypothetical marsupials (Table S5). To trace evolutionary
changes in GI at individual nodes and along lineages, we used a two-rate mode that highlighted
the differences in high (> 1) versus low (< 1) root-to-tip substitutions and then sampled rates
based on posterior probabilities across the tree using a Monte Carlo Markov Chain. We assumed
that transitioning between adjacent GI values had the highest likelihood of occurrence.
The rate at a given node could then be compared to the rate at the subsequent node to determine
if a rate transition was likely. We corroborated these results using the auteur package
(Eastman et al., 2011), which calculates rate-transitions at internal nodes under the assumption
of an Ornstein-Uhlenbeck selection model (Butler and King, 2004) over one million Monte
Carlo sampling iterations drawn from random samplings of posterior distributions of
lineage-specific rates. Scaling relationships were determined for GI as a function of all continuous
life-history and physiological traits, including adult cortical neuron counts. For three insectivore
species (Sorex fumeus, Blarina brevicauda, Scalopus aquaticus), data were available for
neuron counts but not GI, and therefore we extrapolated the GI of those species based on gross
morphology. Finally, to test whether the bimodal distribution of GI may be influenced by the
topology of the mammalian phylogenetic tree, we used an expectation-maximization algorithm.
Each simulated trait was given the same variance as GI (Figure S5) and the result was averaged
over 10^4 simulated datasets. None of the simulations produced the same bimodal distribution
of species observed for GI data.
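
The comparison of each model against the random walk can be sketched as a likelihood-ratio test. The example below is only an illustration under stated assumptions: it treats the alternative model as nesting Brownian motion with a given number of extra parameters (df), and the log-likelihood values are invented. The χ² tail probability is computed with a standard series for the regularized incomplete gamma function so that no external library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees
    of freedom, via the series for the regularized lower gamma P(a, z)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(loglik_null, loglik_alt, df=1):
    """Return (statistic, p-value) for an alternative model that nests
    the null model and has df additional free parameters."""
    statistic = 2.0 * (loglik_alt - loglik_null)
    return statistic, chi2_sf(statistic, df)

# Hypothetical log-likelihoods: Brownian motion versus a one-parameter
# extension such as a delta or Ornstein-Uhlenbeck model.
stat, p_value = likelihood_ratio_test(-120.0, -115.0, df=1)
```

With these invented numbers the statistic is 10, and the alternative model would be preferred at the conventional 0.05 level.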



Estimating neuroepithelial founder pool populations 

We estimated neuroepithelial founder pool populations for mouse and human. For the mouse,
we used coronal sections of an E11.5 mouse embryo obtained from the Allen Brain Atlas (Lein
et al., 2007). We obtained 19 sections equidistantly spaced along the anterior-posterior axis
of the brain. The length of the ventricular surface of the dorsal telencephalon was manually
traced in Fiji (Schindelin et al., 2012) on each section starting from the point above the nascent
hippocampus and ending in the point above the lateral ganglionic eminence. The horizontal
length of the embryonic brain at E11.5 was measured with images from Bejerano et al. (2006).
Using the coronal and horizontal measurements, we constructed a polygon representing the
ventricular surface of the dorsal telencephalon and calculated the area of this surface in Fiji.
We measured the surface area of the end-feet of neuroepithelial cells using EM images of the
coronally cut apical surface of an E11.5 embryonic brain (Table S6). The diameter of a single
cell was calculated by measuring the distance between the adherens junctions. We corroborated
these end-feet calculations with published immunofluorescence stainings of the apical complex
(ZO1 and N-cadherin) from an en face perspective (Bultje et al., 2009; Marthiens and
ffrench-Constant, 2009). The average surface area of a single end-foot was calculated by approximating
the end-foot as a hexagon, and the number of founder cells was estimated by dividing the
surface of the dorsal telencephalon by the surface of an individual end-foot of the neuroepithelial
cell, such that



\[
\text{founders} = \frac{\text{Surface area } (\mu\text{m}^2)}{\dfrac{\sqrt{3}}{2}\,\bigl(\text{End-foot diameter } (\mu\text{m})\bigr)^2} \tag{1}
\]



Our final mouse values were comparable to those previously published (Haydar et al., 2000).
For the human, we followed the same procedure, using 10 coronal sections and one horizontal
section of a gestation week (GW) 9 brain (Bayer and Altman, 2006). End-feet were calculated
using EM images of the apical surface of a human brain at GW13. The measurements are
available in Table S6. Because the number of founder cells per surface area was nearly equivalent
in mouse and human (~4 × 10^5 /mm^2), we used this ratio, along with data on ventricular
volume collected from the literature (Table S1; Table S2; External Database 1), to estimate
neuroepithelial founder cell populations for a further 14 species (Table 1). For species where no
data on ventricular volume were available, values were estimated based on a regression analysis
against brain weight (Figure S6). Ventricular volume was then converted to surface area for each
species by approximating the ventricle as a cylinder with a 4.5-to-1 height-to-diameter proportion.
Ventricular volume-derived ventricular surface area estimates were corroborated with the
surface areas calculated from the literature for mouse and human. Founder cell estimates were
then computed based on the densities derived above for mouse and human. Using this method,
but alternately ignoring our mouse and human calculations to define the parameters, we were
able to predict mouse and human values within 10% of our calculations, respectively.
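
The two estimation routes described above can be sketched as follows. This is a minimal illustration, not the procedure as implemented for the paper: it assumes the end-foot diameter is the flat-to-flat width of a regular hexagon, and that only the lateral surface of the cylinder counts as ventricular surface (whether the end caps were included is not stated). All function names and input values are hypothetical.

```python
import math

def hexagon_area(diameter_um):
    """Area (um^2) of a regular hexagon given its flat-to-flat diameter."""
    return (math.sqrt(3) / 2.0) * diameter_um ** 2

def founders_from_surface(surface_um2, endfoot_diameter_um):
    """Founder cells = ventricular surface / area of one hexagonal end-foot."""
    return surface_um2 / hexagon_area(endfoot_diameter_um)

def cylinder_lateral_area(volume_mm3, height_to_diameter=4.5):
    """Lateral area (mm^2) of a cylinder of the given volume and proportion.
    Assumption: end caps are ignored."""
    # V = pi * (d/2)^2 * h with h = ratio * d, so d^3 = 4V / (pi * ratio).
    d = (4.0 * volume_mm3 / (math.pi * height_to_diameter)) ** (1.0 / 3.0)
    return math.pi * d * (height_to_diameter * d)

def founders_from_volume(volume_mm3, density_per_mm2=4e5):
    """Founder cells from ventricular volume using a fixed surface density."""
    return cylinder_lateral_area(volume_mm3) * density_per_mm2

# Hypothetical inputs: 1 mm^2 of surface with 2 um end-feet, and a
# ventricle whose volume corresponds to a cylinder of diameter 1 mm.
n_surface = founders_from_surface(1.0e6, 2.0)
n_volume = founders_from_volume(math.pi * 4.5 / 4.0)
```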
Mathematical modeling of neurogenesis

Previous work has demonstrated the occurrence of three primary lineages of neuronal generation
in mouse neurogenesis (Fietz and Huttner, 2011) and a further four lineages in human
neurogenesis (Hansen et al., 2010). While there is evidence for at least one additional lineage in
mouse (Noctor et al., 2004), and further lineages may be speculated, we limited our model to the
seven that are considered to contribute most significantly to neuronal output (Rakic, 2009; Lui
et al., 2011; Molnar, 2011; Betizeau et al., 2013). Each
of these seven lineages was summarized in series and solved numerically (Figure 5b).
Neurogenic period was either taken from the literature (External Database 1) or estimated based on a
regression analysis of neurogenic period as a function of gestation period (Figure S6).
Neurogenic period in human was estimated using empirical observations from the literature (Bystron
et al., 2006; Howard et al., 2006; Malik et al., 2013). The averaged cell-cycle length for apical
and basal progenitors from the mouse (18.5 hours) was used for all non-primates (Arai et al.,
2011; Figure S13); averaged cell-cycle length for cortical areas 17 and 18 from the macaque
(45 hours) was used for catarrhines (Lukaszewicz et al., 2005; Betizeau et al., 2013); and an
intermediary cell-cycle length (30 hours), based on personal observations in marmoset, was used
for platyrrhines. Diminishing numbers of neuroepithelial cells have been observed to continue
to proliferate at the ventricle until E18.5 in the mouse (Haubensak et al., 2004). Therefore,
final neuroepithelial founder pool estimates were calculated from the estimates described above
by evenly decreasing the value of α in the Sherley equation (Sherley et al., 1995) from 1 at E9.5
to 0 at E18.5 in the mouse and at comparable neurogenic stages in other species. Neuron numbers
were calculated for each species from combinations of lineages. The proportional contribution
of each lineage for each species was parameterized according to existing data on progenitor
cell-type abundances in mouse (Wang et al., 2011), marmoset (Kelava et al., 2012), rabbit [IK
and WBH, in preparation], and macaque (Betizeau et al., 2013). Where no such data were
available, proportional contributions were permuted for all lineages until a best-fit estimate,
based on cortical neuron numbers taken from the literature (Azevedo et al., 2009; Gabi et al.,
2010; Herculano-Houzel, 2010, 2011), was achieved (Tables 1, 2). Each lineage was assumed to
occur from the first to final day of neurogenesis, although this is only approximately accurate.
Finally, because of published estimates of postnatal apoptosis in the mammalian cortex (Burek
and Oppenheim, 1996; Hutchins and Barger, 1998; Bandeira et al., 2009), we assumed neuron
counts to be 1.5-fold higher at the termination of neurogenesis than in the adult brain; therefore,
neuron number at the termination of neurogenesis was estimated in each species by multiplying
neuron numbers collected from the literature by 1.5. This multiplication is not represented in
Table 1.
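
Three of the bookkeeping steps in this section lend themselves to a short sketch: the number of cell cycles that fit into a neurogenic period, the linear decline of the proliferating fraction between E9.5 and E18.5, and the 1.5-fold apoptosis correction. The lineage series themselves and the Sherley equation are not reproduced here. Function names are hypothetical, the symbol for the proliferating fraction is read as α, and the mouse inputs are the values listed in Table 1.

```python
def cell_cycles(neurogenic_period_days, cell_cycle_hours):
    """Number of consecutive cell cycles that fit in the neurogenic period."""
    return neurogenic_period_days * 24.0 / cell_cycle_hours

def proliferating_fraction(embryonic_day, start=9.5, end=18.5):
    """Fraction decreasing evenly from 1 at `start` to 0 at `end`,
    clamped to the interval [0, 1] outside that window."""
    fraction = 1.0 - (embryonic_day - start) / (end - start)
    return min(1.0, max(0.0, fraction))

def neurons_at_end_of_neurogenesis(adult_neurons, apoptosis_factor=1.5):
    """Adult neuron count scaled up to account for postnatal apoptosis."""
    return adult_neurons * apoptosis_factor

# Mouse values from Table 1: 9-day neurogenic period, 18.5-hour cell
# cycle, 1.37E+07 observed cortical neurons.
mouse_cycles = cell_cycles(9, 18.5)
mouse_target = neurons_at_end_of_neurogenesis(1.37e7)
midpoint_fraction = proliferating_fraction(14.0)
```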
Calculating the effects of proliferative progenitors on neuronal output

Trade-offs in adapting a human neurogenic program with either an expanding neuroepithelial
founder pool or lengthening neurogenic period were tested for the mouse (Mus musculus) and
marmoset (Callithrix jacchus), two lissencephalic species whose cell-type proportions during
neurogenesis have been documented (Noctor et al., 2004; Wang et al., 2011; Kelava et al., 2012).
To estimate the relative reproductive value and stable-stage proportions of each of the lineages
in the mouse and human neurogenic programs, we constructed a stage-structured Lefkovitch
matrix, using sums of the lineage series (after 100 cycles) as fecundity values and complete
permutations of the proportional contributions of each lineage as mortality values. The altered
growth-rates of each lineage were calculated by excluding lineages one at a time and assuming
100% survival in the remaining lineages (Table S4). We introduced three ODEs to explore the
average dynamics of asymmetric versus symmetric progenitors, such that: if a(t), b(t), and c(t)
are the numbers of asymmetrically dividing cells, differentiated cells, and proliferative cells,
respectively, then,

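\[ \frac{da}{dt} = 0 \tag{2} \]

\[ \frac{db}{dt} = ra + 2rc \tag{3} \]

\[ \frac{dc}{dt} = (1-r)a + (1-2r)c \tag{4} \]

where r is equal to growth-rate. If a(t) = a_0, then

\[ b(t) = \frac{2r}{1-2r}\left(c_0 + \frac{1-r}{1-2r}\,a_0\right)\left(e^{(1-2r)t} - 1\right) - \frac{r a_0}{1-2r}\,t + b_0 \tag{5} \]

and

\[ c(t) = \left(c_0 + \frac{1-r}{1-2r}\,a_0\right)e^{(1-2r)t} - \frac{1-r}{1-2r}\,a_0 \tag{6} \]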



We calculated the effect on neuronal output of increasing the likelihood of symmetrically
dividing daughter progenitors in the lineage (Figure S10). The interdependent growth-rates in the
model reflect a purely mechanistic interpretation of determining neuronal output from a finite
pool of asymmetrically dividing cells. The ODEs, therefore, may not reflect differential
regulation of neuronal output via direct versus indirect neurogenesis. The daughter proliferative
cells are designed to carry out one round of proliferation followed by a final round of
self-consumption.
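
As a consistency check on the ODE system, the sketch below evaluates closed-form solutions for b(t) and c(t) under a constant pool of asymmetrically dividing cells (a(t) = a_0) and compares them with a direct fourth-order Runge-Kutta integration of db/dt = ra + 2rc and dc/dt = (1 - r)a + (1 - 2r)c. The closed forms are those obtained by integrating the two rate equations with a held constant; they are undefined at r = 0.5. Parameter values are hypothetical.

```python
import math

def closed_form(t, r, a0, b0, c0):
    """Closed-form b(t), c(t) for constant a(t) = a0 (requires r != 0.5)."""
    lam = 1.0 - 2.0 * r
    k = (1.0 - r) * a0 / lam
    growth = math.exp(lam * t)
    c = (c0 + k) * growth - k
    b = (2.0 * r / lam) * (c0 + k) * (growth - 1.0) - (r * a0 / lam) * t + b0
    return b, c

def rates(b, c, r, a0):
    """Right-hand sides of the rate equations for b and c."""
    return r * a0 + 2.0 * r * c, (1.0 - r) * a0 + (1.0 - 2.0 * r) * c

def integrate_rk4(t_end, r, a0, b0, c0, steps=1000):
    """Classical fourth-order Runge-Kutta integration from 0 to t_end."""
    h = t_end / steps
    b, c = b0, c0
    for _ in range(steps):
        k1b, k1c = rates(b, c, r, a0)
        k2b, k2c = rates(b + 0.5 * h * k1b, c + 0.5 * h * k1c, r, a0)
        k3b, k3c = rates(b + 0.5 * h * k2b, c + 0.5 * h * k2c, r, a0)
        k4b, k4c = rates(b + h * k3b, c + h * k3c, r, a0)
        b += h * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
        c += h * (k1c + 2.0 * k2c + 2.0 * k3c + k4c) / 6.0
    return b, c

# Hypothetical parameters: 100 asymmetrically dividing cells, r = 0.3.
b_exact, c_exact = closed_form(2.0, 0.3, 100.0, 0.0, 0.0)
b_num, c_num = integrate_rk4(2.0, 0.3, 100.0, 0.0, 0.0)
```

Agreement between the two routes for several values of r is a quick way to confirm that the signs and coefficients in the closed forms match the rate equations.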



Determining the gene-neighborhood conservation of neurodevelopmental 
lincRNA 

We used previously published RNA-seq data collected from the VZ, ISVZ, OSVZ, and cortical
plate of human fetal neocortex at GW13-16 (Fietz et al., 2012) and employed the lincRNA
discovery pipeline outlined by Cabili et al. (2011) to identify 186 lincRNA differentially expressed
during human neocortical neurogenesis (Table S8). Of these, 161 were differentially upregulated
in a germinal zone, including 43 overexpressed in the ISVZ and/or OSVZ compared to
the VZ (Figure S14). Previous work has shown that sequence conservation is a poor predictor
of functional conservation in non-coding RNA (Chodroff et al., 2010) and that long non-coding
RNA are often functionally - or at least transcriptionally - linked to adjacently located
protein-coding genes (Ponjavic et al., 2009). Therefore, for each lincRNA, we defined its gene
neighborhood as the immediately flanking protein-coding genes and discarded any lincRNA that
did not have at least one flanking gene expressed during neurogenesis. The final list included
142 lincRNA, whose gene neighborhoods were collectively enriched for Gene Ontology terms
related to forebrain development and cell proliferation (Table S9; Figure S15). We assessed
lincRNA gene-neighborhood conservation for the 142 lincRNA in 32 species (chicken (Gallus
gallus; galGal4), platypus (Ornithorhynchus anatinus; ornAna1), opossum (Monodelphis
domestica; monDom5), armadillo (Dasypus novemcinctus; dasNov3), manatee (Trichechus
manatus; triMan1), tenrec (Echinops telfairi; echTel2), elephant (Loxodonta africana; loxAfr3), little
brown bat (Myotis lucifugus; myoLuc2), giant panda (Ailuropoda melanoleuca; ailMel1), dog
(Canis familiaris; canFam3), ferret (Mustela putorius furo; musFur1), cat (Felis catus; felCat5),
white rhinoceros (Ceratotherium simum; cerSim1), horse (Equus caballus; equCab2), cow (Bos
taurus; bosTau7), sheep (Ovis aries; oviAri3), dolphin (Tursiops truncatus; turTru2), alpaca
(Vicugna pacos; vicPac2), rabbit (Oryctolagus cuniculus; oryCun2), guinea pig (Cavia
porcellus; cavPor3), naked mole-rat (Heterocephalus glaber; hetGla2), rat (Rattus norvegicus; rn4),
mouse (Mus musculus; mm10), bushbaby (Otolemur garnettii; otoGar3), marmoset (Callithrix
jacchus; calJac3), baboon (Papio anubis; papAnu2), rhesus (Macaca mulatta; rheMac3),
gibbon (Nomascus leucogenys; nomLeu3), orangutan (Pongo pygmaeus abelii; ponAbe2), gorilla
(Gorilla gorilla gorilla; gorGor3), chimpanzee (Pan troglodytes; panTro4), and human (Homo
sapiens; hg19)) by BLASTing the lincRNA sequence retrieved from the human RNA-seq data
and visually inspecting the gene-neighborhood for each species. To increase the likelihood
of finding an orthologous region in the non-human species, we used both the entire lincRNA
sequence and, when available, only the region of the lincRNA showing signs of transcriptional
activity in human as evidenced by ENCODE data on chromatin-level information (Raney
et al., 2011). Both sequences consistently identified the same region. Due to scaffolding, we
were only able to assess a sample of the lincRNA (66) for the dolphin. After compiling the
total conservation scores for each species (Table S9), where lincRNA gene-neighborhood could
either be scored as conserved (1) or not conserved (0), we calculated expected scores for each
species under an Ornstein-Uhlenbeck model based on phylogenetic generalized least squares
(Grafen, 1989). The percentage deviations of actual from expected scores for each species are
presented in Figure 6. PhyloP and PhastCons sequence conservation scores for primates were
computed for lincRNA not conserved in mouse. Interestingly, for both metrics, sequence
conservation scores were, on average, highest for gene-neighborhoods comprising at least one
transcription factor (Table S10; Figure S16).
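
The scoring step can be sketched as below. This is an illustration only: it expresses each species' score as the fraction of assessable lincRNA scored as conserved (an assumption made here so that a partially assessed species such as the dolphin remains comparable), and it takes the expected score as a given input rather than deriving it from the phylogenetic model. The function names and example calls are hypothetical.

```python
def conservation_fraction(calls):
    """Fraction of assessable lincRNA scored as conserved.
    Each call is 1 (conserved), 0 (not conserved), or None (not assessable)."""
    assessed = [call for call in calls if call is not None]
    if not assessed:
        raise ValueError("no assessable lincRNA for this species")
    return sum(assessed) / len(assessed)

def percent_deviation(observed, expected):
    """Percentage deviation of the observed from the expected score."""
    if expected == 0:
        raise ValueError("expected score must be non-zero")
    return 100.0 * (observed - expected) / expected

# Hypothetical species: five lincRNA, one of which could not be assessed,
# and an expected conservation fraction of 0.6 supplied by the model.
observed = conservation_fraction([1, 0, 1, None, 1])
deviation = percent_deviation(observed, 0.6)
```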

Despite a significantly stronger correlation of lincRNA gene-neighborhood conservation
with GI than with lifespan, it remained a possibility that slower molecular rates of evolution
in certain species could account for high rates of lincRNA conservation. However, both high-
and low-GI species show rates of molecular evolution below the mammalian average
(Bininda-Emonds, 2007) and there is no significant correlation between rate of molecular evolution and
lincRNA gene-neighborhood conservation (R² = 0.001, P = 0.925). Furthermore, to determine
whether lincRNA gene-neighborhood conservation tends to be higher in high- versus
low-GI species for non-neurodevelopment-related lincRNA, we re-ran our analyses using lincRNA
maximally expressed in human adipose tissue from Cabili et al. (2011). We show that the trend
observed for neurodevelopment-related lincRNA does not extend to adipose tissue (Figure S17).
Table 1: Parameters for models of cortical neurogenesis°

Species      Gestation     Neurogenic     Observed      Neuroepithelial         Cell-cycle
             period (d)    period (d)†    neurons       founder pool (cells)†   length (hours)¶
Human        270           112            1.63E+10      3.10E+07                45
Gorilla      257           103*           9.10E+09      1.59E+07                45
Orangutan    260           104*           8.90E+09      1.16E+07                45
Macaque      166           60             1.71E+09      4.41E+06                45
Baboon       [illegible]   [illegible]    [illegible]   [illegible]             [illegible]
Capuchin     158           59*            1.14E+09      2.97E+06                45
Owl monkey   138           55*            4.42E+08      1.05E+06                30
Callimico    153           60*            3.57E+08      6.92E+05                30
Marmoset     146           58             2.45E+08      6.71E+05                30
Galago       134           54*            2.26E+08      1.01E+06                30
Tupaia       46            19*            6.04E+07      5.68E+05                18.5
Rabbit       30            13             7.15E+07      8.08E+05                18.5
Agouti       112           45*            1.10E+08      9.80E+05                18.5
Capybara     137           55*            3.10E+08      1.78E+06                18.5
Rat          21            10             3.10E+07      5.40E+05                18.5
Mouse        19            9              1.37E+07      3.99E+05                18.5

° see External Database 1
† see Materials and Methods
* estimate based on regression against gestation period (see Figure S6)
¶ see Materials and Methods and Figure S11.



Table 2: Best-fit proportional occurrences (%) of lineages in different taxa**

Taxa          Lineage 1   Lineage 2   Lineage 3   Lineage
Catarrhines   0           20          40          40
Capuchin      0           20          40          40
Owl monkey    0           50          50          0
Callimico     0           50          50          0
Marmoset §    0           60          40          0
Galago        0           75          25          0
Tupaia        10          75          15          0
Rabbit §      10          75          15          0
Agouti        10          75          15          0
Capybara      10          75          15          0
Rat           10          80          10          0
Mouse §       10          80          10          0

§ supported by observational data (see Materials and Methods)
** see Figure 5.
Figure S1: Coronal section of the brain of an adult house cat (Felis catus) (obtained from www.brainmuseum.org)
illustrating the method used to calculate GI values as described in Zilles et al. (1988). Green line, actual contour;
magenta line, hypothetical outer contour. Scale bar, 1 cm.
Figure S2: Maximum-likelihood ancestral node reconstruction of GI values at all internal nodes based on a delta
(δ = 2.635) selection model. Barplot shows the distribution of GI values across the phylogeny; dashed red line
indicates GI = 1.5.
Figure S3: Rate-transitions in the mutation rate of GI values along lineages of the mammalian phylogeny. (a)
A two-mode selection model that weights low over high root-to-tip substitutions. Numbers on the branches
indicate the change in mutation-rate compared to the previous branch; 0 values indicate no significant change,
values > 0 indicate significant change (P < 0.05). Note the especially high rate-transitions leading to primates,
cetartiodactyls, and cetaceans (open blue circles). (b) Mutation- and transition-rate estimates of GI values using
an Ornstein-Uhlenbeck selection model. Branches are colored to illustrate whether the mutation-rate estimates
along each lineage are above (red) or below (blue) the median rate (orange); nodes are circled to indicate the
posterior support of a transition-rate-shift event. The gradient of colors (see key) indicates the degree of deviation
of the mutation-rate estimates (branches) and transition-rate estimates (nodes) from the median, with the highest
deviation being arbitrarily set to ± 1.0 and the median to 0.0; the size of the circles (see key) at the nodes indicates
the degree of posterior support for a transition-rate-shift event, with the highest value being arbitrarily set to 1.0
and lack of support to 0.0. Note that simians have evolved GI values at a rate consistent with the mammalian
median.
Figure S4: Barplots of types of transitions over mammalian evolution between four GI groups (see Figure 2a)
and between five body mass groups averaged over 10^5 simulations. The number of total transitions from one GI
or body mass group to another is summed as either high-to-low or low-to-high transitions. Note that significantly
more high-to-low than low-to-high transitions are observed for GI, but that no significant difference in type of
transition is observed for body mass.
Figure S5: The bimodal distribution of GI values across the phylogeny is non-random. A histogram showing the
frequency of occurrence of GI values, binned at 0.05 intervals, for the 102 mammalian species listed in Table S1.
Blue, GI values < 1.5; red, GI values > 1.5. The bimodal distribution of GI values shows a natural break at GI
= 1.5, which is supported by energy-based hierarchical clustering (see Figure 2b). Note the possibility for a third
GI group (GI > 3, tomato red), constituting cetaceans and elephant; however, we have too few sampled species
from these orders to assess the group decisively (see Figure S11).
Figure S6: Ln-transformed plots of neonate brain weight (a) and ventricular volume (b) as functions of adult
brain weight, neurogenic period as a function of gestation period (c); and a plot of neuroepithelial founder cells as
a function of ventricular surface area (d). (a) Neonate brain weight scales linearly with adult brain weight for 52
eutherian species (y = 1.09x - 1.49, R² = 0.92, P = 6 × 10^-7). (b) Ventricular volume scales linearly with adult
brain weight for 30 eutherian species (y = 0.93x + 2.37, R² = 0.93, P = 9 × 10^-8). (c) Neurogenic period scales
linearly with gestation period for a sample of six species (y = 0.91x - 0.42, R² = 0.94, P = 0.0002), spanning two
mammalian superorders. Predicted neurogenic period is shown for human. (d) Ventricular surface area, converted
from ventricular volume (see Methods), scales linearly with our estimated neuroepithelial founder populations (y
= 6.7 × 10^5 + 878x, R² = 0.94, P = 5 × 10^-8). (a, c) Note that these plots demonstrate the strong predictive
powers of adult brain weight and gestation period for neonate brain weight and neurogenic period, respectively,
validating the assumptions made in Figure 4.
Figure S7: Stacked barplot, for the indicated species, of deviations between the observed neocortical neuron counts
and the ones predicted based on human (red), mouse (blue) and marmoset (yellow) neurogenic programs (see
Table 2 and Figure 5). For each species, deviations were calculated as 100*((Predicted - Observed)/Observed)
and then divided by the sum of deviations obtained for all three programs. Predictions based on the marmoset
program deviate from observed neuron counts not only for the 6 species with a GI value > 1.5 (red text), but also
for 8 of the 10 species with a GI < 1.5 (blue text), indicating a necessity for differential proportional occurrences
of bRG in low-GI species. It is worth noting that natural intraspecific variation in neocortical neuron number has
been shown to be considerably less than interspecific variation (Collins et al., 2010; Young et al., 2013).
Figure S8: Plot of observed neocortical neuron number (red circles) as a function of neurogenic period for six
species with a GI value > 1.5. Predicted neuron numbers are presented for the human neurogenic program (green
circles; see Figure 5, Table 2) and for two further lineages, each of which is assumed to have a 100% proportional
occurrence: direct neurogenesis from bRG (blue circle) and indirect neurogenesis from bRG via a self-consuming
IP cell (orange circle). Note that indirect neurogenesis from bRG via IPs is nearly sufficient to achieve the observed
neuronal count in the Capuchin monkey, but that 100% occurrence of this lineage is unrealistic.
Figure S9: Calculating the adaptiveness of proliferative basal progenitors in mouse (a-e) and marmoset (f-j) in
achieving 10^9 neurons with respect to lengthening neurogenic period and expanding neuroepithelial founder pool
size. The fold-change of lengthening neurogenic period or expanding neuroepithelial founder pool size is indicated
in each relevant plot. (a) The observed neurogenic period and founder pool size in mouse generates 1.37 × 10^7
neurons using the mouse neurogenic program. (b, c) Lengthening the neurogenic period (b) or expanding the
founder pool size (c) using the mouse program to achieve 10^9 neurons. (d, e) Lengthening the neurogenic period
(d) or expanding the founder pool size (e) using the human neurogenic program combination to achieve 10^9
neurons. (f) The observed neurogenic period and founder pool size in marmoset generates 2.45 × 10^8 neurons
using the marmoset neurogenic program. (g, h) Lengthening the neurogenic period (g) or expanding the founder
pool size (h) using the marmoset program to achieve 10^9 neurons. (i, j) Lengthening the neurogenic period (i)
or expanding the founder pool size (j) using the human neurogenic program to achieve 10^9 neurons.
Figure S10: Neuronal outputs from solutions to ODEs describing direct versus indirect neurogenesis for
growth-rate values < 0.5. Contour plot of neuronal densities for a varying initial asymmetrically dividing cell population
(a) and likelihood of direct (r = 1) versus indirect (r = 0) neurogenesis. Note that neuronal output increases
maximally when both the initial cell pool increases (a → 100) and the likelihood of indirect neurogenesis increases
(r → 0).
Figure S11: Neocortical development in marine mammals may be largely explained by the same neurogenic
program as terrestrial mammals. (a) Observed neocortical neuron numbers for human, four cetacean species, and
one marine carnivore (Harp seal) are shown beside neuron numbers calculated from the human (red) and mouse
(blue) neurogenic programs (see Text). Asterisks denote neuron numbers that are significantly different (T > 7,
P < 0.05) from the observed. (b) The number of neurons generated per neurogenic day (green) and per body
weight (gold) in the same six species. Note that the Bottlenose dolphin is the only species for which the human
program is not sufficient to achieve its observed number of neurons; and, although the fin whale generates more
neurons per neurogenic day, the human program produces a higher neuron count due to the fin whale's large
estimated founder pool. See Table S7 for data.
Figure S12: Neocortical complexity, represented here as cortical gyrification, is tightly linked to progenitor behavior
in the OSVZ. The nature of the link, however, is such that incremental changes to OSVZ progenitor behavior
(inner ring) may effect exponential changes in neocortical complexity (outer ring). Therefore, only minor changes
in the proliferative capacity of basal progenitors (yellow arrow, inner ring) are needed to distinguish the major
differences in neocortical complexity (yellow arrow, outer ring) between the macaque and human. It remains to
be shown whether shifts in the proliferative capacity of OSVZ progenitors and neocortical complexity can occur
independently (i.e., whether the arrow can be bent). Pictured clockwise: mouse, capybara, ferret, macaque,
human.
Figure S13: Cell-cycle dynamics of progenitors in non-primates. (a) Cell-cycle for Tis21± apical and basal
progenitors at different stages of neurogenesis from live-imaging studies performed in the mouse (Arai et al.,
2011). (b) Barplot of the observed number of neurons in the neocortex of five rodents and a sister species to
primates compared to the number of neurons predicted using a fixed cell-cycle of 18.5 hours, as was done in
Figure 5, and the number of neurons predicted using dynamic cell-cycles for each progenitor as shown in (a).
Note that for all species the predictions based on fixed and dynamic cell-cycles deviate by < 1%. The percentage
deviations between observed and mouse neurogenic program-predicted neuron numbers are listed in Table S3.
Figure S14: Germinal zone-specific transcript levels for lincRNA extracted from GW13-16 human RNA-seq.
LincRNA differentially overexpressed in the (a) ISVZ and (b) OSVZ. (c) Mean and mode (inset) transcript levels
for all lincRNA not differentially overexpressed in the cortical plate. See Table S8.
Figure S15: LincRNAs expressed during human neurogenesis tend to have gene-adjacent neighbors involved in
neocortical development. Shown are fold-enrichments of Gene Ontology (GO) terms for adjacent protein-coding
gene neighbors of the 142 lincRNA expressed during human neurogenesis (see Table S9). GO terms are listed if
they are over-represented in the protein-coding gene set (P < 10^-2). Fold differences for enriched GO terms were
analyzed using DAVID (http://david.abcc.ncifcrf.gov/summary.jsp).






Downloaded from http://biorxiv.org/ on September 18, 2014 



[Figure S16 panels; axis labels: Conservation score, PhastCons score, PhyloP score]

Figure S16: Sequence conservation among primates for lincRNAs expressed during human neurogenesis tends to
be higher for lincRNAs flanked by at least one transcription factor. (a) The PhastCons score for 62 lincRNAs
expressed during human neurogenesis, whose gene neighborhoods are not conserved in mouse. (b) PhyloP scores
for the lincRNA in (a). (c) Boxplot of the mean and median PhyloP and PhastCons scores for primates for the
lincRNA shown in (a, b). Mean differences for PhastCons (T = -2.371, P = 0.0227) and PhyloP (T = -3.513, P
= 0.0009) were significantly different, but median differences were significant only for PhyloP (T = -5.211, P =
2.354e-6), not PhastCons (T = -0.706, P = 0.4847).







Figure S17: (a) Gene-neighborhood conservation for neurodevelopmental lincRNA (lilac) and lincRNA maximally 
expressed in human adipose tissue (forest green) for three high-GI (red) and three low-GI (blue) species. Conser- 
vation in the naked mole rat, rabbit, and elephant are significantly different (P < 0.05) for neurodevelopmental 
compared to adipose lincRNA, while similar levels of conservation are observed for both classes of lincRNA in 
rhesus monkey, mouse, and horse, (b) Density plot of conservation values in (a). The 95% confidence intervals 
are shown (dashed gray lines) for the peak distribution of values under a null model of evolution (i.e., an expec- 
tation model based on phylogenetic relatedness). Note that the peak distribution of adipose lincRNA, but not 
neurodevelopment lincRNA conservation, falls within these intervals. 